Welcome back.
There we go.
We've been wrapping up decision theory, that is, making decisions while taking
time into account in partially observable environments, and the technique is called POMDPs.
Remember, the idea was that we use Markov processes to model the world, which we've
talked about extensively, and now we do the decision theory on top of that, which
gives us these MDPs, Markov decision processes, where we basically maximize the expected
utility of actions.
And the interesting thing here is that in partially observable Markov
decision processes we can no longer rely on policies as we know them.
Policies don't help us because we don't know where we are:
a policy is something that takes each state and tells you what to do optimally in it.
If we don't know which state we are in, that's useless.
So we need to do something else, and that's what we're slowly but surely developing.
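To make the recap concrete, here is a minimal sketch of a policy as a plain lookup table from states to actions; the 4x3 grid world, the state labels, and the chosen actions are illustrative assumptions, not taken from the lecture.

```python
# Minimal sketch: an MDP policy is just a mapping state -> action.
# Hypothetical 4x3 grid world, states labeled (column, row), compass-move actions.
policy = {
    (1, 1): "N", (1, 2): "N", (1, 3): "E",
    (2, 1): "W", (2, 3): "E",
    (3, 1): "W", (3, 2): "N", (3, 3): "E",
    (4, 1): "W",
}

def act(state):
    """In a fully observable MDP we simply look up the action for the current state."""
    return policy[state]

print(act((1, 1)))  # works because we know exactly which state we are in;
# under partial observability there is nothing to index the table with.
```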
We were kind of in the middle.
Why aren't you doing anything?
Because something is...
There we go.
And to go partially observable, we've updated our example with a new feature, namely that
we have a noisy sensor.
Instead of a fully observable environment where we just know where we are, where we
can observe the state, we have a sensor that counts the adjacent walls, which by itself
already gives only partial observability.
We can only observe the walls, so we cannot distinguish between this state and that state:
they have the same number of adjacent walls.
One has a wall left and right, the other one has a wall left and top, or west and north.
And to make things even worse, we're going to add noise.
Okay?
So what we typically have here is a sensor model, or observation model, whatever you
want to call it. We want to assume the sensor Markov property and that the sensor model
is stationary, so the sensor doesn't change its characteristics over time, but we still
don't know where we are.
Okay?
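Here is a minimal sketch of such a noisy wall-count sensor model. The 4x3 grid layout, the blocked cell, the noise level EPSILON, and all helper names are my own illustrative assumptions; the point is that the observation probability depends only on the current state, which is exactly the sensor Markov property and stationarity.

```python
import random

# Hypothetical 4x3 grid world, state = (col, row), with one blocked cell.
COLS, ROWS = 4, 3
BLOCKED = {(2, 2)}
STATES = [(c, r) for c in range(1, COLS + 1) for r in range(1, ROWS + 1)
          if (c, r) not in BLOCKED]

def wall_count(state):
    """Number of adjacent walls (grid boundary or obstacle) around a state."""
    c, r = state
    neighbours = [(c + 1, r), (c - 1, r), (c, r + 1), (c, r - 1)]
    return sum(1 for (nc, nr) in neighbours
               if not (1 <= nc <= COLS and 1 <= nr <= ROWS) or (nc, nr) in BLOCKED)

EPSILON = 0.1  # assumed noise level for illustration

def sensor_model(observation, state):
    """P(observation | state): stationary, depends only on the current state."""
    if observation == wall_count(state):
        return 1.0 - EPSILON
    return EPSILON / 4  # any of the four wrong counts, uniformly

def sense(state):
    """Draw one noisy wall-count reading in the given state."""
    true = wall_count(state)
    if random.random() < 1.0 - EPSILON:
        return true
    return random.choice([o for o in range(5) if o != true])
```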
So the question is, how do we deal with this?
And we've been relying on the theorem by Åström that the optimal policy in a
POMDP is not a function on the states, right?
Normally a policy is something that tells you: this is a state, do that; but here it's a
function on the belief state.
So we're lifting everything up one level.
Instead of having single states, we have belief states, which are probability distributions
over states.
So the idea that we're going to pursue, and have been pursuing, is to basically convert a
POMDP into an MDP in belief space, and the point there is that we can then use MDP methods
because the belief state is fully observable.
Why?
Pardon me?
The point here is that you always know what you believe, right?
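A minimal sketch of why that is: the new belief state is computed entirely from things the agent already knows, namely its previous belief, the action it took, the transition model, and the percept it just received; the hidden state never enters. This continues the hypothetical grid world and sensor_model from the earlier sketch, and the deterministic transition_model here is a placeholder assumption.

```python
def transition_model(s2, s, action):
    """Placeholder P(s' | s, a): deterministic compass move, bumping into a wall stays put."""
    c, r = s
    dc, dr = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}[action]
    target = (c + dc, r + dr)
    if target not in STATES:
        target = s
    return 1.0 if s2 == target else 0.0

def update_belief(belief, action, observation):
    """One filtering step: b'(s') ∝ P(obs | s') * sum_s P(s' | s, a) * b(s).

    Everything on the right-hand side is known to the agent, so the belief
    state is 'fully observable' even though the real state is not.
    """
    predicted = {s2: sum(transition_model(s2, s, action) * belief[s] for s in belief)
                 for s2 in STATES}
    unnorm = {s2: sensor_model(observation, s2) * p for s2, p in predicted.items()}
    total = sum(unnorm.values()) or 1.0
    return {s2: p / total for s2, p in unnorm.items()}

# Start with a uniform belief ("no idea where we are"), move north, perceive 2 walls.
uniform = {s: 1.0 / len(STATES) for s in STATES}
b1 = update_belief(uniform, "N", 2)
print(max(b1, key=b1.get), round(max(b1.values()), 3))
```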